Goto

Collaborating Authors

 optimal transport map


Acceleration via silver stepsize on Riemannian manifolds with applications to Wasserstein space

Neural Information Processing Systems

There is extensive literature on accelerating first-order optimization methods in an Euclidean setting. Under which conditions such acceleration is feasible in Riemannian optimization problems is an active area of research. Motivated by the recent success of silver stepsize methods in the Euclidean setting, we undertake a study of such algorithms in the Riemannian setting. We provide the new class of algorithms determined by the choice of vector transport that allows the silver stepsize acceleration on Riemannian manifolds for the function classes associated with the corresponding vector transport. As a core application, we show that our algorithm recovers the standard Wasserstein gradient descent on the 2-Wasserstein space and, as a result, provides the first provable accelerated gradient method for potential functional optimization problems in the Wasserstein space.


Stability and Oracle Inequalities for Optimal Transport Maps between General Distributions

Neural Information Processing Systems

Optimal transport (OT) provides a powerful framework for comparing and transforming probability distributions, with wide applications in generative modeling, AI4Science and statistical inference. However, existing estimation theory typically requires stringent smoothness conditions on the underlying Brenier potentials and assumes bounded distribution supports, limiting practical applicability. In this paper, we introduce a unified theoretical framework for semi-dual OT map estimation that relaxes both of these restrictions. Building on sieved convex conjugate, our framework has two key contributions: (i) a new map stability bounds that holds without any second-order regularity assumptions on the true Brenier potentials, and (ii) an oracle inequality that cleanly decomposes the estimation error into statistical error, sieved bias, and approximation error. Specifically, our approximation error is measured in the L1 norm rather than Sobolev norm in the existing results, aligning more naturally with classical approximation theory. Leveraging these tools, we provide statistical error of semi-dual estimators with mild and verifiable conditions on the true OT map. Moreover, we establish the first theoretical guarantee for deep neural network OT map estimator between general distributions, with Tanh network function class as an example.


Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points

Neural Information Processing Systems

Wasserstein gradient flow (WGF) is a common method to perform optimization over the space of probability measures. While WGF is guaranteed to converge to a first-order stationary point, for nonconvex functionals the converged solution does not necessarily satisfy the second-order optimality condition; i.e., it could converge to a saddle point. In this work, we propose a new algorithm for probability measure optimization, perturbed Wasserstein gradient flow (PWGF), that achieves second-order optimality for general nonconvex objectives. PWGF enhances WGF by injecting noisy perturbations near saddle points via a Gaussian process-based scheme. By pushing the measure forward along a random vector field generated from a Gaussian process, PWGF helps the solution escape saddle points efficiently by perturbing the solution towards the smallest eigenvalue direction of the Wasserstein Hessian. We theoretically derive the computational complexity for PWGF to achieve a second-order stationary point. Furthermore, we prove that PWGF converges to a global optimum in polynomial time for strictly benign objectives.


Sample Complexity of Transfer Learning: An Optimal Transport Approach

arXiv.org Machine Learning

Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(α+1)/d})$, with $α$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports a better sample efficiency for transfer learning, when the target task is optimizing over a family of not-so-smooth models (i.e., highly complex networks with the possible use of non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency for transfer learning, that is, in the data hungry regime, the model performance can be significantly improved by transfer learning.


On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

arXiv.org Machine Learning

Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker--Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments which support the derived rates.


Sliced Wasserstein Steering between Gaussian Measures

arXiv.org Machine Learning

Optimal transport with quadratic cost provides a geometric framework for steering an ensemble, modeled by a probability law, with minimal effort. Yet ambient-space formulations become unwieldy in high dimensions, and sensing or actuation in practice often reveals only linear views of the state -- camera silhouettes, LiDAR beams, tomographic slices. We develop a sliced feedback controller for distribution steering: the evolving law is projected onto one-dimensional directions on the sphere, the optimal one-dimensional velocity is synthesized in each projection, and these velocities are averaged to produce a feedback control in the ambient space. The construction reduces to the Benamou--Brenier problem in one dimension. In addition, it is invariant under orthogonal transforms, nonexpansive under projections, and well posed on $\mathcal{P}_2(\mathbb{R}^n)$. Computation proceeds by sampling directions on the sphere and solving independent one-dimensional subproblems, yielding a scalable method aligned with partial observations. In the Gaussian setting, we show that the developed sliced controller steers the law to the prescribed target. Furthermore, we derive an identity relating the energy consumption incurred by the controller to the sliced Wasserstein distance.